Model-Based Robot Learning

Christopher  G.  Atkeson,  Eric  W.  Aboaf, 
Joseph  McIntyre  and  David  J.  Reinkensmeyer 


1  Introduction 

An important component of human motor skill is the ability to improve
performance by practicing a task. Commands are refined on the basis of
performance errors. It is often suggested that such learning reduces the
need for an accurate internal model, a model of the mechanical plant in
the control system (see Arimoto, 1984b; Wang and Horowitz, 1985; and
Harokopos, 1986 for examples). This is not the case. Internal models play
an important role in generating command corrections from performance
errors. As an internal model is made more accurate, learning efficiency
is improved, as is initial performance.

This paper will show, in a series of examples, how internal models can be
used as learning operators. The examples are 1) positioning a limb at a
visual target, 2) throwing a ball at a target, and 3) following a defined
trajectory. The essence of the model-based learning algorithms used to
improve performance on these tasks is that internal models are used to
transform performance errors into command corrections.

The type of learning described in this paper (refining commands on the
basis of practice) complements many other types of adaptive processes.
Feedback controller designs can be improved by adaptive control
algorithms. Internal models can be incrementally improved using system
identification techniques. Trajectories can be optimized for particular
tasks. Robot plans and programs can be debugged as errors are discovered
during execution. This paper focuses on improving execution of a given
task plan by refining the commands given to the robot.


Model-Based Learning Algorithm Structure

The model-based learning algorithms described here all have the same
form. Commands are refined on the basis of performance errors. A command
is applied to the controlled system (Figure 1A). Performance errors may
result from errors in the command. A model of the inverse of the
controlled system is used to estimate the errors in the command based on
the measured performance or output errors (Figure 1B). If the inverse
model of the controlled system were perfect, the command errors would be
correctly estimated and completely eliminated after one attempt at
performing the task. (Of course, if a perfect model of the controlled
system were available then the initial command would also have been
perfect.) Perfect knowledge of the controlled system is not usually
available, and the model of the inverse of the controlled system will be
incorrect. Due to the modeling errors, the command correction will be
incomplete, and learning will be an iterative process of refining the
command.

There are three steps to the learning algorithms: command initialization,
execution, and modification. The initial command is generated by applying
the inverse model of the controlled system to the desired performance.
During execution, a command is applied to the system and the actual
performance is monitored. The command correction is calculated by
applying the inverse model to the performance errors. The refined command
is now executed. The cycle of command execution and modification is
repeated until desired performance is achieved.
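
The three steps can be sketched for a scalar command as follows. This is only an illustrative sketch, not code from the experiments: the function name `refine_command`, the tolerance, and the example plant and inverse model in the usage note are all assumptions made for illustration.

```python
def refine_command(plant, inverse_model, desired, tolerance=1e-6, max_attempts=50):
    """Refine a scalar command, using an inverse model as the learning operator.

    plant and inverse_model are caller-supplied functions (hypothetical here).
    Returns the final command and the list of performance errors per attempt.
    """
    # Command initialization: apply the inverse model to the desired performance.
    command = inverse_model(desired)
    errors = []
    for _ in range(max_attempts):
        # Command execution: apply the command and monitor the actual performance.
        actual = plant(command)
        errors.append(actual - desired)
        if abs(errors[-1]) < tolerance:
            break
        # Command modification: map the performance error through the inverse
        # model to estimate the command error, then remove that estimate.
        command -= inverse_model(actual) - inverse_model(desired)
    return command, errors
```

For example, with an assumed mildly nonlinear plant `lambda u: 2.0*u + 0.3*u**3` and a deliberately imperfect linear inverse model `lambda y: y/2.5`, calling `refine_command(plant, inverse_model, 1.0)` needs several attempts rather than one, because the inverse model only approximately inverts the plant.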
Figure 1: The inverse of the controlled system is used to estimate command errors from performance errors.


2  A  Kinematic  Example 

The task of positioning the limb at a visual target will be used to
provide a specific example of how model-based learning works. A robot arm
and a target are viewed by a vision system (Figure 2). The robot arm
servos to a commanded set of joint angles, $\theta$, and the vision
system measures the tip position, $x$, in vision system coordinates. The
controlled system in this case transforms commanded joint angles into a
measured tip position (Figure 1A):

$$x = L(\theta) \qquad (1)$$

The forward kinematics, $L()$, is in general a nonlinear transformation.
For the purposes of this example we will assume there are no
singularities or redundancies to resolve in the field of view of the
vision system. For each desired tip position there is one and only one
appropriate set of joint angles.

A model of the inverse kinematics is used to transform the desired tip
position, $x_d$, into an initial joint angle command, in the command
initialization stage:

$$\theta^0 = \hat{L}^{-1}(x_d) \qquad (2)$$

A caret (^) is used to indicate a model or an estimate of a quantity. The
initial joint angle command is applied in the first execution stage, and
the corresponding tip position is measured:

$$x^0 = L(\theta^0) \qquad (3)$$

The true system, $L()$, and its inverse are unknown, and only imperfect
models are available. Due to modeling errors, the actual tip position,
$x^0$, will not match the desired tip position, $x_d$.

At this point we must decide how to transform the measured tip position
error into a correction to the set of commanded joint angles. Performance
errors must be mapped into command corrections. The same model of the
inverse kinematics that was used to generate the initial command,
$\hat{L}^{-1}()$, will be used to estimate the command error (Figure 1B).

The command error, $\delta\theta^0$, is the difference between the
currently commanded joint angles, $\theta^0$, and the (unknown) correct
set of joint angles, which will be indicated as $\theta^*$. The command
error can be computed in terms of the actual and desired performances
using the true system inverse:

$$\delta\theta^0 = \theta^0 - \theta^* = L^{-1}(x^0) - L^{-1}(x_d) \qquad (4)$$

As we do not have perfect knowledge of the true system inverse, we must
use a model of the system inverse to estimate the command error:

$$\delta\hat{\theta}^0 = \hat{L}^{-1}(x^0) - \hat{L}^{-1}(x_d) \qquad (5)$$

The command is updated by simply subtracting the estimate of the command
error from the previous command:

$$\theta^1 = \theta^0 - \delta\hat{\theta}^0 \qquad (6)$$

If the model of the system inverse were perfect, the command error would
be estimated correctly and completely eliminated on the next attempt.
However, a model is rarely perfect, so command correction must be an
iterative process of estimating a command error using an imperfect model,
removing the estimated command error, applying the refined command, and
using the resulting performance error and the model to estimate remaining
errors in the command. Equations (3), (5), and (6) can be indexed with
$i$ to indicate that they are applied on each practice attempt,
reflecting the iterative nature of the algorithm:

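1. Command initialization:

$$\theta^0 = \hat{L}^{-1}(x_d) \qquad (7)$$

2. Command execution:

$$x^i = L(\theta^i) \qquad (8)$$

3. Command error estimation:

$$\delta\hat{\theta}^i = \hat{L}^{-1}(x^i) - \hat{L}^{-1}(x_d) \qquad (9)$$

4. Command modification:

$$\theta^{i+1} = \theta^i - \delta\hat{\theta}^i \qquad (10)$$

Steps 2, 3, and 4 are repeated until satisfactory performance is
achieved.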
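
As a concrete illustration of steps 1 through 4, the sketch below applies equations (7) to (10) to a simulated planar two-link arm. Everything specific here is an assumption made for illustration and is not taken from the experiments: the two-link geometry, the link lengths, and the choice to make the model wrong only in its link lengths.

```python
import math

TRUE_LINKS = (1.0, 0.8)    # hypothetical "true" link lengths (unknown to the learner)
MODEL_LINKS = (1.1, 0.75)  # hypothetical imperfect link lengths used by the model

def forward_kinematics(theta, l1, l2):
    """Tip position of a planar two-link arm for joint angles theta = (t1, t2)."""
    t1, t2 = theta
    return (l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2))

def inverse_kinematics(x, l1, l2):
    """One branch (0 <= t2 <= pi) of the two-link inverse kinematics."""
    px, py = x
    c2 = (px * px + py * py - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    t2 = math.acos(max(-1.0, min(1.0, c2)))  # clamp guards against round-off
    t1 = math.atan2(py, px) - math.atan2(l2 * math.sin(t2), l1 + l2 * math.cos(t2))
    return (t1, t2)

def learn_reach(x_d, attempts=10):
    """Run the four-step algorithm; return final command and tip errors per attempt."""
    model_target = inverse_kinematics(x_d, *MODEL_LINKS)
    theta = model_target                                   # (7) initialization
    tip_errors = []
    for _ in range(attempts):
        x = forward_kinematics(theta, *TRUE_LINKS)         # (8) execution
        tip_errors.append(math.hypot(x[0] - x_d[0], x[1] - x_d[1]))
        estimate = inverse_kinematics(x, *MODEL_LINKS)
        delta = (estimate[0] - model_target[0],
                 estimate[1] - model_target[1])            # (9) error estimation
        theta = (theta[0] - delta[0], theta[1] - delta[1]) # (10) modification
    return theta, tip_errors
```

With these assumed numbers, `learn_reach((1.0, 0.8))` drives the tip error toward zero over a handful of attempts even though the model's link lengths are never corrected; only the command is refined.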
Figure 2: A robot arm and a target are viewed by a vision system.

Convergence 

The quality of the inverse model used as the learning operator determines
how fast model-based learning converges. Fixed point theory can be used
to analyze the general nonlinear case (Wang 1984, Wang and Horowitz
1985). A learning algorithm can be viewed as a mapping of commands on the
$i$th attempt to commands on the next attempt:

$$\theta^{i+1} = F(\theta^i) \qquad (11)$$

The previously described algorithm can be put into this form by
substituting equation (8) into (9) and (9) into (10). The model-based
learning algorithm modifies the $i$th command with a correction based on
the performance error transformed by the inverse model:

$$\theta^{i+1} = \theta^i - \left(\hat{L}^{-1}(L(\theta^i)) - \hat{L}^{-1}(x_d)\right) \qquad (12)$$

Note that when the desired performance, $x_d$, is achieved using the
correct command, $\theta^*$, then $L(\theta^*) = x_d$ and equation (12)
reduces to the fixed point $\theta^{i+1} = \theta^i = \theta^*$.

We can ask whether this fixed point is stable by analyzing a
linearization of equation (12) at the point
$(\theta, x) = (\theta^*, x_d)$. For a small perturbation $\delta\theta$
from the fixed point,

$$L(\theta^* + \delta\theta) \approx x_d + J(\theta^*)\,\delta\theta \qquad (13)$$

where $J$ is the Jacobian matrix of derivatives of $L()$. Similarly, for
a small perturbation $\delta x$ from the fixed point,

$$\hat{L}^{-1}(x_d + \delta x) \approx \hat{L}^{-1}(x_d) + \hat{J}^{-1}(x_d)\,\delta x \qquad (14)$$

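where $\hat{J}^{-1}$ is the Jacobian matrix for the inverse model
$\hat{L}^{-1}()$. If on the $i$th trial the command is perturbed from
$\theta^*$ by $\delta\theta^i$ so that
$\theta^i = \theta^* + \delta\theta^i$, the error in the next command,
$\delta\theta^{i+1} = \theta^{i+1} - \theta^*$, can be computed by
substituting equations (13) and (14) into equation (12):

$$\delta\theta^{i+1} = \left(1 - \hat{J}^{-1}(x_d)\,J(\theta^*)\right)\delta\theta^i \qquad (15)$$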

If $\hat{J}^{-1}$ is a correct inverse of $J$ the command error will be
completely corrected after one attempt, in the linear case. The command
error $\delta\theta$ will decrease when all of the eigenvalues of the
matrix $(1 - \hat{J}^{-1}J)$ are less than one in absolute value, with
the rate of decrease determined by the magnitude of the eigenvalues. If
the magnitude of any eigenvalue is greater than one, the learning process
will be unstable and performance degraded rather than improved by
learning. The magnitudes of the eigenvalues of $(1 - \hat{J}^{-1}J)$
depend on how accurately $\hat{J}^{-1}$ inverts $J$, and thus the
convergence rate of the learning algorithm depends on how closely the
learning operator inverts the controlled system.
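
The eigenvalue condition can be checked numerically. The sketch below, restricted to the 2x2 case for brevity, computes the largest eigenvalue magnitude of $(1 - \hat{J}^{-1}J)$. The helper names and the Jacobian values in the usage note are illustrative assumptions, not values from the paper.

```python
import cmath

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def mat_inv(m):
    """Inverse of a 2x2 matrix (assumes a nonzero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def learning_spectral_radius(j_true, j_model):
    """Largest eigenvalue magnitude of (1 - inv(j_model) * j_true), as in (15)."""
    prod = mat_mul(mat_inv(j_model), j_true)
    m = [[(1.0 if r == c else 0.0) - prod[r][c] for c in range(2)]
         for r in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # Eigenvalues of a 2x2 matrix from its characteristic polynomial;
    # cmath.sqrt handles the complex-conjugate case.
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(abs((trace + root) / 2.0), abs((trace - root) / 2.0))
```

For an assumed true Jacobian `[[-0.8, -0.8], [1.0, 0.0]]`, a slightly wrong model Jacobian such as `[[-0.8, -0.75], [1.0, -0.1]]` gives a radius well below one (fast convergence), whereas a model Jacobian that underestimates every entry by a factor of 0.4 gives a radius above one, so the linearized learning process would diverge.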

Input  vs.  output  disturbance  estimation 

Although our performance errors are due to errors in modeling the
controlled system, the model-based learning algorithm was derived by
assuming that an unknown error was added to the command. In the kinematic
tip positioning example a constant command disturbance would correspond
to constant joint angle offsets added to the commanded joint angles. The
learning algorithm just described can be viewed as an iterative procedure
to estimate a command disturbance.

An alternative version of the model-based learning algorithm is suggested
by assuming that the major sources of error are output (performance)
disturbances rather than input (command) disturbances. In the kinematic
example just presented, the camera measuring tip position could have an
unknown offset, $\Delta$. This offset could initially be assumed to be
zero, and after each positioning attempt an estimate of the offset could
be refined by subtracting the tip position error:

$$\Delta^i = \Delta^{i-1} - (x^{i-1} - x_d) \qquad (16)$$

The estimated output offset would be added to the desired tip position
when the next joint angle command was computed:

$$\theta^i = \hat{L}^{-1}(x_d + \Delta^i) \qquad (17)$$

Equations (16) and (17) replace equations (9) and (10) in the input
disturbance version of the model-based learning algorithm to form the
output disturbance version.

Representing possible modeling errors as either input or output errors is
a modeling decision that depends on the assumed source of the modeling
errors. In the output disturbance version of the model-based learning
algorithm, as in the input disturbance version, the performance error is
mapped through an inverse model of the controlled system to calculate a
command correction. The output disturbance model-based learning algorithm
has convergence properties similar to those of the input disturbance
algorithm.
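
A minimal scalar sketch of the output disturbance version, following equations (16) and (17), is given below. The function name and the example plant (an assumed gain error plus a constant measurement offset) are hypothetical and chosen only to show the update.

```python
def learn_with_output_offset(plant, inverse_model, x_d, attempts=10):
    """Refine an output-offset estimate as in (16) and (17).

    plant and inverse_model are caller-supplied scalar functions.
    Returns the final offset estimate and the performance error per attempt.
    """
    offset = 0.0  # the offset is initially assumed to be zero
    errors = []
    for _ in range(attempts):
        # (17): aim at the desired output shifted by the estimated offset.
        command = inverse_model(x_d + offset)
        x = plant(command)
        errors.append(x - x_d)
        # (16): refine the offset estimate by subtracting the output error.
        offset -= x - x_d
    return offset, errors
```

For instance, with an assumed plant `lambda th: 2.2*th + 0.5` (true gain 2.2, measurement offset 0.5) and the imperfect inverse model `lambda x: x/2.0`, the error shrinks by roughly a factor of ten per attempt in this linear case, because the model gain is within ten percent of the true gain.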
Figure 3: The throwing task, showing the target plate.


3  Learning  to  Throw 

Model-based learning can be used to improve performance on a complete
task, in addition to improving positioning. As an example of task level
learning, a robot arm was programmed to throw balls at a target. The
robot's throwing accuracy improved with practice.

Figure 3 illustrates the apparatus used in the throwing experiments. The
target was at the center of a large metal plate, which was placed
approximately 5 meters from the base of the robot. For this throwing task
only the height of the ball when it hit the target plate was monitored
and improved by a learning algorithm.

The last link of a three joint direct drive arm was used as a catapult to
throw a ball. The robot was positioned so that the last link of the arm
rotated in a vertical plane. The last joint was servoed to a fifth order
polynomial trajectory that began at rest at 225° and ended at rest at
45°. A 4 cm diameter rubber ball was placed onto a 3.5 cm diameter hole
at the end of the last link. The ball left the hole as the robot arm
decelerated during the throw. No release mechanism was used. The release
position of the ball was assumed to be when the last link was at 135°.
The distance the ball was thrown was controlled by changing the duration
of the throwing movement, which changed the release velocity. A shorter
duration and therefore faster movement threw the ball higher and further,
and a longer duration movement threw the ball lower and closer.

A video camera was used to record where the ball hit the target plate.
The impact of the ball was sensed by a force sensor on which the target
plate was mounted. This signal was used to choose video frames to be
stored for later analysis. After the throw, the location of the ball on
the target plate was manually measured from the appropriate video frame.

The initial release velocity command was calculated by measuring the
distance to the target and using a simple ballistics model, incorporating
only gravity, to predict the required flight trajectory given the assumed
release position and initial direction of ball flight. The corresponding
trajectory duration was computed and the calculated trajectory executed.
On the first throw the ball hit the target plate 28 cm above the target.
The model-based learning algorithm based on estimating an output offset
(equations (16) and (17)) was used to improve performance on the throwing
task. This output offset learning algorithm corresponds to our intuition
that we should aim lower if we are hitting too high, and vice versa. The
role of the internal model is to calculate how much the aim should be
changed. The ballistics model used to generate initial performance was
also used to calculate the appropriate release velocity as the aim was
offset by the estimated disturbance amount. The open squares in Figure 4
show the throwing performance during model-based learning. In this
particular experiment the ball hit the target on the eighth throw.
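
The aim-offset procedure can be illustrated with a small simulation. The sketch below is not a reconstruction of the experiment: the release angle, the release point at the height origin, the 4 percent speed mismatch standing in for unmodelled effects, and the use of release speed (rather than movement duration) as the command are all simplifying assumptions. Only the gravity-only ballistics model and the output offset update of equations (16) and (17) come from the text.

```python
import math

G = 9.81                            # gravitational acceleration, m/s^2
DISTANCE = 5.0                      # horizontal distance to the plate, m
RELEASE_ANGLE = math.radians(45.0)  # assumed launch direction (hypothetical)
SPEED_SCALE = 1.04                  # hypothetical unmodelled speed mismatch

def model_height(speed):
    """Gravity-only ballistics: ball height at the plate for a release speed."""
    return (DISTANCE * math.tan(RELEASE_ANGLE)
            - G * DISTANCE ** 2 / (2.0 * speed ** 2 * math.cos(RELEASE_ANGLE) ** 2))

def model_release_speed(height):
    """Inverse of model_height: release speed that should hit a given height."""
    drop = DISTANCE * math.tan(RELEASE_ANGLE) - height
    return math.sqrt(G * DISTANCE ** 2 / (2.0 * math.cos(RELEASE_ANGLE) ** 2 * drop))

def simulated_throw(speed):
    """Stand-in for the real throw: the ball leaves faster than commanded."""
    return model_height(SPEED_SCALE * speed)

def practice_throws(target_height, throws=10):
    """Output-offset learning: aim lower after hitting high, and vice versa."""
    offset = 0.0
    errors = []
    for _ in range(throws):
        speed = model_release_speed(target_height + offset)  # as in (17)
        error = simulated_throw(speed) - target_height
        errors.append(error)
        offset -= error                                      # as in (16)
    return errors
```

In this noise-free simulation the height error shrinks quickly with each throw. The real experiment also contained sensor noise and non-repeatable disturbances, so the simulated convergence rate should not be read as a prediction of the measured one.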

The open triangles in Figure 4 indicate the performance of a model-based
learning algorithm that improves the model as well as refining the
command. This algorithm will be discussed in a later paper.
4  Trajectory  Learning

Trajectory execution of a robot can be improved using a model-based
learning algorithm (Atkeson and McIntyre 1986a, 1986b). A model of the
robot inverse dynamics is used as the learning operator that transforms
trajectory following errors into feedforward command corrections. This
form of learning is useful for refining repetitive motions, and can also
be used to refine groups of similar motions. Model-based trajectory
learning was implemented on the MIT Serial Link Direct Drive Arm and
greatly reduced trajectory following errors in a small number of practice
movements.

The robot model used as the learning operator in the trajectory learning
experiments was identified from movements of the MIT Serial Link Direct
Drive Robot Arm (Atkeson, An, and Hollerbach 1986). The dynamics of this
direct drive robot arm are dominated by rigid body dynamics, so a
Newton-Euler model structure was used. The Newton-Euler rigid body
dynamics equations for a robot can be written as

$$\tau = R^{-1}(\theta, \dot{\theta}, \ddot{\theta}) = I(\theta) \cdot \ddot{\theta} + \dot{\theta} \cdot C(\theta) \cdot \dot{\theta} + g(\theta) \qquad (18)$$

where $\theta(t)$ is the desired trajectory of the joint angles,
$\tau(t)$ is the vector of required torques to achieve the desired
trajectory, $I(\theta)$ is the inertia matrix of the arm, $C(\theta)$ is
the Coriolis and centripetal force tensor, and $g(\theta)$ is the
gravitational force vector (Hollerbach, 1984). For other types of robots
it is argued that additional sources of dynamics are important (Goor,
1985; Good, Sweet, and Strobel, 1985). In these cases we can still model
the robot dynamics and invert the model.

As before, there are several stages of the algorithm. The initial
feedforward command is generated by applying the model of the robot
inverse dynamics to the desired trajectory (as in equation (7)):

$$\tau_{ff}^0(t) = \hat{R}^{-1}(\theta_d(t), \dot{\theta}_d(t), \ddot{\theta}_d(t)) \qquad (19)$$


During command execution the applied command is the sum of the
feedforward command, $\tau_{ff}$, and the output of the feedback
controller, $\tau_{fb}$:

$$\tau^i(t) = \tau_{ff}^i(t) + \tau_{fb}^i(t) \qquad (20)$$

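The total applied command, $\tau$, is used as the basis for the next
feedforward command. As described in the previous sections, the command
error is estimated using the model of the robot inverse dynamics (as in
equation (9)):

$$\delta\hat{\tau}^i(t) = \hat{R}^{-1}(\theta^i(t), \dot{\theta}^i(t), \ddot{\theta}^i(t)) - \hat{R}^{-1}(\theta_d(t), \dot{\theta}_d(t), \ddot{\theta}_d(t)) \qquad (21)$$

and the next feedforward command is the modified total command (as in
equation (10)):

$$\tau_{ff}^{i+1}(t) = \tau^i(t) - \delta\hat{\tau}^i(t) \qquad (22)$$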
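
Equations (19) to (22) can be exercised on a simulated single joint. The sketch below is an illustration under stated assumptions, not the implementation used on the Direct Drive Arm: the one-joint plant with viscous friction, all numerical parameters, the PD feedback gains, and the simple Euler integration are hypothetical. Because accelerations are available exactly in simulation, no filtering is needed here, and for a single joint the model has no velocity-dependent term.

```python
import math

DT, DURATION = 0.001, 1.5
STEPS = int(round(DURATION / DT))
TRUE_I, TRUE_B, TRUE_K = 1.0, 0.5, 5.0  # hypothetical plant: inertia, friction, gravity
MODEL_I, MODEL_K = 0.8, 4.5             # imperfect model: no friction term at all
KP, KD = 100.0, 20.0                    # hypothetical fixed PD feedback gains

def desired(t):
    """Fifth order polynomial from 0 to 1 rad, at rest at both ends."""
    s = t / DURATION
    pos = 10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5
    vel = (30 * s ** 2 - 60 * s ** 3 + 30 * s ** 4) / DURATION
    acc = (60 * s - 180 * s ** 2 + 120 * s ** 3) / DURATION ** 2
    return pos, vel, acc

def model_inverse_dynamics(pos, acc):
    """Model torque for a given position and acceleration."""
    return MODEL_I * acc + MODEL_K * math.sin(pos)

def run_trial(feedforward):
    """Execute one movement; return next feedforward command and RMS position error."""
    pos, vel, next_ff, sq_err = 0.0, 0.0, [], 0.0
    for n in range(STEPS):
        pos_d, vel_d, acc_d = desired(n * DT)
        feedback = KP * (pos_d - pos) + KD * (vel_d - vel)
        torque = feedforward[n] + feedback                              # (20)
        acc = (torque - TRUE_B * vel - TRUE_K * math.sin(pos)) / TRUE_I
        delta = (model_inverse_dynamics(pos, acc)
                 - model_inverse_dynamics(pos_d, acc_d))                # (21)
        next_ff.append(torque - delta)                                  # (22)
        sq_err += (pos - pos_d) ** 2
        vel += acc * DT   # simple Euler integration of the simulated plant
        pos += vel * DT
    return next_ff, math.sqrt(sq_err / STEPS)

def practice_trajectory(trials=4):
    """Initialize from the model (19), then practice; return RMS error per trial."""
    ff = [model_inverse_dynamics(desired(n * DT)[0], desired(n * DT)[2])
          for n in range(STEPS)]
    rms_errors = []
    for _ in range(trials):
        ff, rms = run_trial(ff)
        rms_errors.append(rms)
    return rms_errors
```

Calling `practice_trajectory()` returns the RMS position error for each of four simulated movements. With these assumed parameters the error drops substantially after the first update, even though the model omits friction entirely and misestimates the inertia by 20 percent.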

Other Approaches to Trajectory Learning

Recent work in a number of laboratories has focused on how to refine
feedforward commands for repetitive movements on the basis of previous
movement errors. Work on repeated trajectory learning includes (Arimoto
et al 1984, 1985; Casalino & Gambardella 1986; Craig 1984; Furuta &
Yamakita 1986; Hara et al 1985; Harokopos 1986; Mita & Kato 1985; Morita
1986; Togai & Yamano 1986; Uchiyama 1978; Wang 1984; Wang & Horowitz
1985). These papers discuss only linear learning operators and emphasize
the stability of the proposed algorithms. There has been little work
emphasizing performance, i.e. the convergence rate of the algorithm.
Simulations of several of these algorithms have revealed very slow
convergence and large sensitivity to disturbances and sensor and actuator
noise (C. G. Atkeson, unpublished results).

An  Implementation  of  the  Trajectory 
Learning  Algorithm 

The model-based trajectory learning algorithm has been implemented on the
MIT Serial Link Direct Drive Arm (Atkeson and McIntyre 1986a, 1986b).
This three joint arm is described in (Atkeson, An, and Hollerbach 1986).
To explore the effectiveness of the model-based trajectory learning
algorithm we will present results on learning a particular trajectory.

The Test Trajectory: All three joints of the Direct Drive Arm were
commanded to follow a fifth order polynomial trajectory with zero initial
and final velocities and accelerations and a 1.5 second duration. Figure
5 shows the shape of the trajectory for each joint, and Table 1 gives the
initial and final joint positions, the peak joint velocities, and the
peak joint accelerations.
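
The six boundary conditions (position, velocity, and acceleration at both ends) determine a fifth order polynomial uniquely, so a trajectory of this kind can be sketched as below. The function name and argument layout are illustrative assumptions; the joint values used on the arm are those of Table 1 and are not reproduced here.

```python
def quintic_trajectory(start, end, duration, t):
    """Fifth order polynomial with zero end-point velocity and acceleration.

    Returns (position, velocity, acceleration) at time t, with t clamped
    to the interval [0, duration].
    """
    s = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    d = end - start
    position = start + d * (10 * s ** 3 - 15 * s ** 4 + 6 * s ** 5)
    velocity = d * (30 * s ** 2 - 60 * s ** 3 + 30 * s ** 4) / duration
    acceleration = d * (60 * s - 180 * s ** 2 + 120 * s ** 3) / duration ** 2
    return position, velocity, acceleration
```

The peak velocity occurs at the midpoint and equals 1.875 times the displacement divided by the duration, which is a convenient check on any implementation.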
Figure 5: The test trajectory (position versus time; time scale bar: 0.5 seconds).


Joint    Initial Position (radians)    Final Position (radians)    Peak Velocity (radians/s)    Peak Acceleration (radians/s^2)

1 

0.5 

5.0 

2 

5.0 

-5.0 

3 

4.0 

-5.6 

±11.5 

Table  1:  Test  trajectory  parameters. 


The  Feedback  Controller:  An  independent  digital 
feedback  controller  was  implemented  for  each  joint  and 
was  not  modified  during  learning. 

Initialization Of The Feedforward Command: The initial feedforward
torques were generated from a rigid body dynamics model. The model and
the estimation of its parameters are described in (Atkeson, An, and
Hollerbach, 1986). The calculated feedforward torques are shown in
Figure 6A.

Initial Trajectory Performance: As an index of trajectory following
performance the velocity errors (the difference between the actual joint
velocity and the desired joint velocity) for the first movement are shown
in Figure 7A. We have plotted the raw velocity error data to give an idea
of the relative size of the trajectory errors and sensor noise.

Calculating Acceleration and Filtering: In order to use the rigid body
inverse dynamics model to compute joint torques it was necessary to
compute the joint accelerations. Joint positions and velocities were
measured directly. A digital differentiating filter combined with an 8 Hz
low pass filter was applied to the velocity data to estimate
accelerations.

To reject noise and non-repeatable disturbances and to compensate for
high frequency unmodelled dynamics it was necessary to filter the
trajectory errors and controller output. In this implementation we
applied low pass digital filters with an 8 Hz cutoff to the data used in
the learning process. We filtered the references used by the learning
operator with the same filter used on the data. It was also necessary to
correct for inconsistencies between the velocity sensors and the position
measurements, which was done by adjusting the position reference to the
feedback controller until the integrated velocity error matched the
position error.

Final Trajectory Performance: The robot executed two additional training
movements which are not shown, and its performance on the fourth attempt
of the test trajectory was assessed. Figure 6B shows the modified
feedforward commands used on the fourth movement, and should be compared
with the predicted torques shown in Figure 6A. Figure 7B shows the
velocity errors for the fourth movement, and should be compared with the
initial movement velocity errors in Figure 7A. There has been a
substantial reduction in trajectory following error after only three
practice movements.

5  Issues  For  Further  Research 

Some of the questions that warrant further research include the effect of
modeling errors and non-repeatable disturbances on convergence, and
learning of non-repetitive tasks.

As discussed previously, the convergence of model-based learning
algorithms depends on the quality of the model. Accurate models support
efficient learning. Inaccurate models may cause learning algorithms to
degrade performance rather than improve it.

Reducing or filtering the estimated command correction will make
model-based learning more robust to modeling errors. Convergence will be
slowed, however. Further research is required into the appropriate
tradeoff between handling modeling errors and fast convergence. Filtering
of the model-based command update also plays an important role in
reducing the effect of non-repeatable disturbances.
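
The tradeoff can be seen in a scalar linear sketch, where the estimated correction is multiplied by a step size between zero and one before it is applied. The function name, the step size parameter, and the gains in the usage note are assumptions made for illustration; the text above does not prescribe a particular way of reducing the correction.

```python
def practice_with_step_size(true_gain, model_gain, step_size, x_d=1.0, attempts=8):
    """Scalar learning loop with a reduced command correction.

    The plant is x = true_gain * command; the inverse model divides by
    model_gain. Returns the absolute output error on each attempt.
    """
    command = x_d / model_gain  # initialization from the inverse model
    errors = []
    for _ in range(attempts):
        x = true_gain * command
        errors.append(abs(x - x_d))
        # Apply only a fraction of the model-based correction estimate.
        command -= step_size * (x - x_d) / model_gain
    return errors
```

With a poor model (true gain 2.5, model gain 1.0) the full correction (`step_size=1.0`) makes the error grow on every attempt, while `step_size=0.5` converges. With a good model (model gain 2.4) the full correction converges much faster than the halved one, which is the price paid for the added robustness.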

If intermediate sensory signals are available, then breaking the control
system into modules and having each module learn independently may
improve learning performance. We plan to explore this issue in the
throwing task. If measurements are available of when and where the ball
is released, then independent models of the throwing motion and the ball
flight characteristics can be made. These independent models can be used
to choose an appropriate release velocity separately from refining the
trajectory that attains that release velocity.

It is possible to modify models as well as commands during learning. In
the examples presented in this paper the same models were used repeatedly
even after it became clear during learning that the models had large
errors. We have explored some methods of model refinement during
practice. The open triangles of Figure 4 show the faster convergence of a
model-based learning algorithm that improves the model as well as the
command.
Figure 6: Feedforward Torques. A. Initial Movement. B. After 3 Practice Trials.

Figure 7: Velocity Errors. A. Initial Movement. B. After 3 Practice Trials.
The model-based learning algorithms are ideally suited to refining
repetitive commands for the same tasks. The learning algorithms can also
be applied to refining commands for different tasks by assuming that
similar command errors will be made on similar tasks. An estimate of the
command error on one task will be useful for improving the command for
other tasks that share features with the original task.

6  Conclusion 

The main message of this paper is that models play an important role in
learning from practice. Better models lead to faster correction of
command errors. The incorporation of learning in a control system is not
a license to do a poor modeling job of the controlled system. The
benefits of accurate modeling are better performance in all aspects of
control, while the risks of inadequate modeling are poor learning
performance or even degradation of performance with practice.

The  approach  to  robot  learning  presented  here  is 
based  on  explicit  modeling  of  the  robot  and  the  task 
being  performed.  An  inverse  model  of  the  task  is  used 
as  the  learning  operator  that  processes  the  errors.  Such 
model-based  command  refinement  algorithms  usefully 
complement  other  approaches  to  adaptive  control. 

Studying  model-based  learning  algorithms  serves  two 
purposes:  1)  to  improve  robot  performance,  and  2)  to 
increase  our  understanding  of  the  role  of  practice  and 
internal  models  in  human  motor  learning. 